Appendix A

CASE‐STUDY 1: NORTHERN AUSTRALIA

Authors
Affiliations

Julie Vercelloni

Australian Institute of Marine Science, Townsville, Australia

Centre for Data Science, Queensland University of Technology, Brisbane, Australia

Murray Logan

Australian Institute of Marine Science, Townsville, Australia

Andrew Zammit‐Mangion

School of Mathematics and Applied Statistics, University of Wollongong, Wollongong, Australia

King Abdullah University of Science and Technology, Thuwal, Saudi Arabia

Matthew Sainsbury‐Dale

School of Mathematics and Applied Statistics, University of Wollongong, Wollongong, Australia

King Abdullah University of Science and Technology, Thuwal, Saudi Arabia

Britta Schaffelke

Australian Institute of Marine Science, Townsville, Australia

Kerrie Mengersen

Centre for Data Science, Queensland University of Technology, Brisbane, Australia

Manuel González‐Rivero

Australian Institute of Marine Science, Townsville, Australia

Coral cover predictions at multiple scales

Figure 1: Estimated coral trends at the data-tier level from 2006 to 2024. Black lines represent mean predicted coral cover, with shaded areas indicating predictive intervals. Dots show observed coral cover averaged at the tier level, which were used as the response variable in the spatio-temporal model.
Figure 2: Examples of estimated coral trends at the predictive-tier level from 2006 to 2024. Black lines represent mean predicted coral cover, with shaded areas indicating predictive intervals.
Figure 3: Predicted coral cover values across the Wet Tropics for each year from 2006 to 2024. Values correspond to the posterior means.
Figure 4: Uncertainty associated with predictions across the Wet Tropics for each year from 2006 to 2024. Values correspond to the posterior credible intervals.

Model validation diagnostics

Model fit

Figure 7: Model goodness-of-fit. Coral cover observations were used to train the spatio-temporal model. The R² statistic measures how well the predicted values reproduce the observations; values closer to 1 indicate a better fit, with points lying closer to the 1:1 line (red dotted line).

Model residuals diagnostics

Leave-out-data analysis

The leave-out-data approach evaluates the influence of specific observations using prediction-performance measures. The full model is fitted to a training dataset composed of a random sample of observations, and the prediction-performance measures are computed on the left-out observations.

We found that the predictive performance of the spatio-temporal model is more sensitive to the number of monitoring locations than to the number of replicated years: accuracy declines when sites or reefs are removed from the dataset (Figure 8), whereas removing years of observations has less effect.

Five validation tests are developed:

    1. rm(20% obs): 20% of the observations were randomly removed without structure.
    2. rm(20% reef): 20% of reefs were randomly removed.
    3. rm(20% site): 20% of sites were randomly removed.
    4. rm(20% transect): 20% of transects were randomly removed.
    5. rm(3YRS): 3 years of observations were removed within each location (locations with fewer than 4 years of data were not used).
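
The five tests above can be sketched as simple train/held-out splits. The following is a minimal illustration, not the authors' implementation: the record layout (dictionaries with `reef`, `site`, `transect` and `year` keys), the function names and the fixed seeds are all assumptions made for the example. For rm(3YRS), it is further assumed that locations with fewer than 4 years of data are excluded from both the training and the held-out sets.

```python
import random
from collections import defaultdict


def remove_random_observations(records, fraction=0.2, seed=0):
    """rm(20% obs): hold out a random fraction of rows, ignoring structure."""
    rng = random.Random(seed)
    n_out = round(fraction * len(records))
    held_idx = set(rng.sample(range(len(records)), n_out))
    train = [r for i, r in enumerate(records) if i not in held_idx]
    held_out = [r for i, r in enumerate(records) if i in held_idx]
    return train, held_out


def remove_random_groups(records, key, fraction=0.2, seed=0):
    """rm(20% reef/site/transect): hold out all rows of a random fraction of groups."""
    rng = random.Random(seed)
    groups = sorted({r[key] for r in records})
    n_out = round(fraction * len(groups))
    held_groups = set(rng.sample(groups, n_out))
    train = [r for r in records if r[key] not in held_groups]
    held_out = [r for r in records if r[key] in held_groups]
    return train, held_out


def remove_years_per_location(records, key="site", n_years=3, min_years=4, seed=0):
    """rm(3YRS): hold out n_years random years within each eligible location."""
    rng = random.Random(seed)
    years_by_location = defaultdict(set)
    for r in records:
        years_by_location[r[key]].add(r["year"])
    # Choose the years to hold out, only for locations with enough years of data.
    held_years = {
        loc: set(rng.sample(sorted(years), n_years))
        for loc, years in years_by_location.items()
        if len(years) >= min_years
    }
    train, held_out = [], []
    for r in records:
        if r[key] not in held_years:
            continue  # assumption: locations with too few years are not used
        (held_out if r["year"] in held_years[r[key]] else train).append(r)
    return train, held_out
```

In each case the model would be refitted on `train` and the predictive measures computed on `held_out`. Removing whole reefs or sites tests spatial extrapolation, whereas removing years within a location tests temporal interpolation at monitored locations.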

Four predictive measures are used:

  • 95% coverage error (CvgErr): evaluates how often the 95% predictive intervals contain the true observations, with the goal of capturing the true values 95% of the time. It is estimated as follows:

\[ \text{CvgErr}(y, \ell, u) \;=\; \left| 0.95 \;-\; \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\!\left( \ell_i < y_i < u_i \right) \right| \]

where \(y = \{y_1, y_2, \dots, y_n\}\) are the coral cover observations, \(\ell\) and \(u\) are the lower and upper bounds of the predictive intervals, \(n\) is the total number of predictions, and \(\mathbf{1}(\cdot)\) is the indicator function (1 if the condition is true, 0 otherwise).

  • 95% interval score (IS): rewards prediction intervals that include the true observations (accuracy) and penalizes those that are too narrow or too wide (precision). It is computed as follows:

\[ \text{IS}_{95} \;=\; \frac{1}{n} \sum_{i=1}^{n} \Bigg[ (u_i - \ell_i) + \frac{2}{\alpha} (\ell_i - y_i)\,\mathbf{1}(y_i < \ell_i) + \frac{2}{\alpha} (y_i - u_i)\,\mathbf{1}(y_i > u_i) \Bigg] \]

where \(\alpha = 0.05\), \(\ell\) and \(u\) are the lower and upper bounds of the predictive intervals, \(n\) is the total number of predictions, and \(y\) are the observed coral cover values.

  • Root-mean-squared prediction error (RMSPE): measures how far the model predictions are from the true observations, without accounting for uncertainty. It is computed as follows:

\[ \text{RMSPE} \;=\; \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 } \]

where \(y\) and \(\hat{y}\) are the observed and predicted coral cover values, respectively, and \(n\) the total number of observations.

  • Continuous ranked probability score (CRPS): assesses the quality of the predictions over the entire predictive probability distribution, penalizing predictions that are inaccurate, imprecise or overconfident. For a normal predictive distribution, it is computed as follows:

\[ \text{CRPS}(F, y) \;=\; \sigma \left[ z \left( 2 \Phi(z) - 1 \right) \;+\; 2 \,\phi(z) \;-\; \frac{1}{\sqrt{\pi}} \right], \quad z = \frac{y - \mu}{\sigma} \]

where \(y\) is the observed coral cover value, \(\mu\) and \(\sigma\) are the mean and the standard deviation of the normal predictive distribution, \(\phi(\cdot)\) is the standard normal probability density function and \(\Phi(\cdot)\) is the standard normal cumulative distribution function.

Each of these predictive measures yields a single number, with lower scores indicating better performance.
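
The four measures can be written compactly in code. The sketch below follows the formulas given above using only the Python standard library; the function names are illustrative, and averaging the per-observation CRPS over the held-out observations (`mean_crps`) is an assumption about how a single number is obtained.

```python
import math


def coverage_error(y, lower, upper, nominal=0.95):
    """Absolute gap between nominal and empirical interval coverage."""
    covered = sum(1 for yi, li, ui in zip(y, lower, upper) if li < yi < ui)
    return abs(nominal - covered / len(y))


def interval_score(y, lower, upper, alpha=0.05):
    """Mean interval score: interval width plus penalties for missed observations."""
    total = 0.0
    for yi, li, ui in zip(y, lower, upper):
        total += ui - li
        if yi < li:
            total += (2.0 / alpha) * (li - yi)
        if yi > ui:
            total += (2.0 / alpha) * (yi - ui)
    return total / len(y)


def rmspe(y, y_hat):
    """Root-mean-squared prediction error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def crps_normal(y, mu, sigma):
    """Closed-form CRPS of a normal predictive distribution for one observation."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


def mean_crps(y, mu, sigma):
    """Average CRPS over observations (assumed aggregation)."""
    return sum(crps_normal(a, m, s) for a, m, s in zip(y, mu, sigma)) / len(y)
```

For instance, with four observations of which three fall strictly inside their intervals, the empirical coverage is 0.75 and the coverage error is |0.95 - 0.75| = 0.20.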

Figure 8: Metrics of model predictive performance by test. Each axis represents one of the evaluation metrics described above. The length of each arm represents the model predictive performance on that metric for a given test, with values scaled from 0 (best) to 1 (poorest). Overall, better predictive performance is indicated by smaller polygons.
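
One way to place the four metrics on a common 0 to 1 scale is min-max scaling of each metric across the tests. This is a sketch under that assumption; the scaling actually used for Figure 8 is not stated, and the function name is illustrative.

```python
def min_max_scale(scores):
    """Rescale {test_name: score} to [0, 1], with 0 the best (lowest) score."""
    lowest, highest = min(scores.values()), max(scores.values())
    if highest == lowest:
        # All tests perform identically on this metric.
        return {name: 0.0 for name in scores}
    return {name: (value - lowest) / (highest - lowest)
            for name, value in scores.items()}
```

Applied separately to each metric, the test with the lowest score on that metric maps to 0 and the test with the highest score maps to 1, so the polygon of a test that performs well on all metrics stays close to the centre.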

Basis function exploratory analysis

The aim of this analysis is to explore the influence of the basis function formulation, with a focus on the temporal dimension. To do this, we compare the performance of four models that use different numbers of temporal basis functions:

  • Full: number and locations of the temporal basis functions determined automatically by the FRK function, as adopted by ReefCloud.
  • 5: five temporal basis functions randomly selected from those generated by the FRK auto basis function.
  • 3: three temporal basis functions randomly selected from those generated by the FRK auto basis function.
  • 1: one temporal basis function randomly selected from those generated by the FRK auto basis function.

We found that the number of temporal basis functions strongly influences the estimated regional trends (Figure 10), whereas it has less effect on the attribution of coral loss (Figure 11).

Figure 9: Locations of the basis functions for each model: a) Full, b) 5, c) 3 and d) 1.
Figure 10: Regional trends predicted by the spatio-temporal models. Solid lines represent mean predicted coral cover, with shaded areas indicating predictive intervals.
Figure 11: Estimated effect sizes of disturbances. The points represent the estimated effects and the intervals represent the corresponding 95% confidence intervals.